character-level model
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The paper presents a character-level convolutional network architecture and applies it to eight text classification problems on large datasets that the authors construct. It also presents comparative results from several word-based deep neural network models as well as bag-of-n-grams models. The character-level ConvNets outperform word-based models on four out of the eight datasets when word-based data augmentation is used. The clarity and quality of the writing are acceptable, but the presentation of the method and results could have been much clearer, and there are numerous grammatical and spelling errors.
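As background, character-level ConvNets of this kind typically consume text as a matrix of one-hot character vectors. A minimal sketch of that quantization step, with an illustrative alphabet and sequence length rather than the paper's exact configuration:

```python
# Illustrative character quantization for a character-level ConvNet input.
# Characters outside the alphabet map to an all-zero row; the text is
# truncated or zero-padded to a fixed sequence length.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, seq_len=16):
    """Encode `text` as a seq_len x len(ALPHABET) one-hot matrix."""
    matrix = [[0] * len(ALPHABET) for _ in range(seq_len)]
    for pos, char in enumerate(text.lower()[:seq_len]):
        idx = CHAR_TO_IDX.get(char)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix

encoded = quantize("hello")
```

The resulting matrix is what the convolutional layers slide over, exactly as they would over an image.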
Character-level NMT and language similarity
We explore the effectiveness of character-level neural machine translation with the Transformer architecture across various levels of language similarity and training-data sizes, on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, a character-level vanilla Transformer-base model often lags behind subword-level segmentation. We confirm previous findings that the gap can be closed by fine-tuning already-trained subword-level models at the character level.
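The two input segmentations being compared can be illustrated concretely; the subword split below is a hand-picked hypothetical example, not the output of a trained tokenizer:

```python
# Subword-level vs character-level segmentation of the same text.
# Character-level input gives the model every character individually,
# at the cost of much longer sequences.

def char_segment(sentence):
    """Split a sentence into characters, marking spaces explicitly."""
    return [c if c != " " else "▁" for c in sentence]

subwords = ["▁pře", "klad"]  # hypothetical subword split of Czech "překlad"
chars = char_segment("překlad")
```

For the same sentence, the character-level sequence is several times longer, which is where the extra compute cost of character-level Transformers comes from.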
- Europe > United Kingdom > UK North Sea (0.05)
- Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.05)
- Europe > Italy > Tuscany > Florence (0.04)
- (12 more...)
- Research Report (0.50)
- Overview (0.46)
What is the best recipe for character-level encoder-only modelling?
This paper aims to benchmark recent progress in language understanding models that output contextualised representations at the character level. Many such modelling architectures and methods to train those architectures have been proposed, but it is currently unclear what the relative contributions of the architecture vs. the pretraining objective are to final model performance. We explore the design space of such models, comparing architectural innovations and a variety of different pretraining objectives on a suite of evaluation tasks with a fixed training procedure in order to find the currently optimal way to build and train character-level BERT-like models. We find that our best-performing character-level model exceeds the performance of a token-based model trained with the same settings on the same data, suggesting that character-level models are ready for more widespread adoption. Unfortunately, the best method to train character-level models still relies on a subword-level tokeniser during pretraining, and final model performance is highly dependent on tokeniser quality. We believe our results demonstrate the readiness of character-level models for multilingual language representation, and encourage NLP practitioners to try them as drop-in replacements for token-based models.
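One way a subword-level tokeniser can enter character-level pretraining is by defining which character spans get masked together. A toy sketch of that idea, with a hand-picked subword split rather than a real tokeniser (the exact objective in the paper may differ):

```python
import random

# Character-level masking informed by subword boundaries: mask every
# character of one randomly chosen subword span, so the model must
# reconstruct whole morphemes rather than isolated characters.

MASK = "#"

def mask_subword(chars, boundaries, rng):
    """Mask all characters of one randomly chosen (start, end) span."""
    start, end = rng.choice(boundaries)
    return [MASK if start <= i < end else c for i, c in enumerate(chars)]

chars = list("unhappiness")
boundaries = [(0, 2), (2, 7), (7, 11)]  # "un", "happi", "ness"
masked = mask_subword(chars, boundaries, random.Random(0))
```

This is exactly why tokeniser quality can leak into final model performance: bad boundaries produce bad masking targets.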
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- (8 more...)
DeepMath - Deep Sequence Models for Premise Selection
Chollet, François
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.
- Asia > Middle East > Jordan (0.05)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (4 more...)
- Instructional Material (0.46)
- Research Report > Promising Solution (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Subword-Delimited Downsampling for Better Character-Level Translation
Edman, Lukas, Toral, Antonio, van Noord, Gertjan
Subword-level models have been the dominant paradigm in NLP. However, character-level models have the benefit of seeing each character individually, providing the model with more detailed information that ultimately could lead to better models. Recent works have shown character-level models to be competitive with subword models, but costly in terms of time and computation. Character-level models with a downsampling component alleviate this, but at the cost of quality, particularly for machine translation. This work analyzes the problems of previous downsampling methods and introduces a novel downsampling method which is informed by subwords. This new downsampling method not only outperforms existing downsampling methods, showing that downsampling characters can be done without sacrificing quality, but also leads to promising performance compared to subword models for translation.
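The core idea can be sketched as pooling per-character states within subword spans instead of within fixed-size windows. This is my illustrative reading, not the paper's exact method:

```python
# Subword-delimited downsampling sketch: collapse per-character hidden
# states to one state per subword span via mean-pooling, so sequence
# length shrinks at linguistically meaningful boundaries.

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def downsample(char_states, spans):
    """Collapse per-character states to one state per (start, end) span."""
    return [mean_pool(char_states[a:b]) for a, b in spans]

# Four characters with 2-d states, split into subwords "ab" + "cd".
states = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
pooled = downsample(states, [(0, 2), (2, 4)])
```

Fixed-window downsampling would pool characters from different subwords together; delimiting the windows at subword boundaries avoids mixing unrelated units, which is the quality argument the abstract makes.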
How Good is Your Chatbot? An Introduction to Perplexity in NLP
New, state-of-the-art language models like DeepMind's Gopher, Microsoft's Megatron, and OpenAI's GPT-3 are driving a wave of innovation in NLP. How do you measure the performance of these language models to see how good they are? In a previous post, we gave an overview of different language model evaluation metrics. This post dives more deeply into one of the most popular: a metric known as perplexity.
Understanding Perplexity Metrics in Natural Language AI
Imagine you're trying to build a chatbot that helps home cooks autocomplete their grocery shopping lists based on popular flavor combinations from social media.
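Perplexity, as commonly defined for language models, is the exponentiated average negative log-likelihood of the observed tokens. A minimal computation, with made-up per-token probabilities:

```python
import math

# Perplexity of a sequence given the model's probability for each
# observed token: exp of the mean negative log-likelihood.

def perplexity(token_probs):
    """Perplexity from the model's per-token probabilities."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice among four options.
value = perplexity([0.25, 0.25, 0.25])
```

Lower perplexity means the model found the text less surprising, which is why it is a natural intrinsic metric for chatbot language models.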
Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets
Tiedemann, Jörg, Nakov, Preslav
This paper provides an analysis of character-level machine translation models used in pivot-based translation when applied to sparse and noisy datasets, such as crowdsourced movie subtitles. In our experiments, we find that such character-level models cut the number of untranslated words by over 40% and are especially competitive (improvements of 2-3 BLEU points) in the case of limited training data. We explore the impact of character alignment, phrase table filtering, bitext size, and the choice of pivot language on translation quality. We further compare cascaded translation models to the use of synthetic training data via multiple pivots, and we find that the latter works significantly better. Finally, we demonstrate that neither word- nor character-level BLEU correlates perfectly with human judgments, due to BLEU's sensitivity to length.
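One concrete source of BLEU's length sensitivity is its brevity penalty, which punishes candidates shorter than the reference exponentially. A sketch of the standard formula (the paper's exact scoring setup may differ):

```python
import math

# BLEU brevity penalty: 1.0 when the candidate is at least as long as
# the reference, otherwise an exponential penalty on the length ratio.

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty for a single candidate/reference pair."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# A candidate half the reference length loses roughly 63% of its score
# from the penalty alone, regardless of n-gram precision.
half_length_bp = brevity_penalty(5, 10)
```

Since word-level and character-level systems can produce outputs of systematically different lengths, this penalty alone can move scores in ways human judges would not.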
- Europe > Czechia > Prague (0.05)
- North America > United States > New York > Monroe County > Rochester (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (23 more...)
Deep Search Query Intent Understanding
Liu, Xiaowei, Guo, Weiwei, Gao, Huiji, Long, Bo
Understanding a user's query intent behind a search is critical for modern search engine success. Accurate query intent prediction allows the search engine to better serve the user's needs by rendering results from more relevant categories. This paper aims to provide a comprehensive learning framework for modeling query intent at different stages of a search. We focus on the design of 1) character-level models that predict users' intents on the fly as they type in typeahead search; and 2) accurate word-level intent prediction models for complete queries. We experiment with various deep learning components for query text understanding. Offline evaluation and online A/B test experiments show that the proposed methods are effective at understanding query intent and efficient to scale for online search systems.
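The typeahead setting can be made concrete with a toy prefix-matching stand-in; a production system would use the character-level neural models the paper describes, and the intents and example queries below are invented:

```python
# Toy on-the-fly intent prediction: score each intent by how many of
# its example queries start with the prefix typed so far. A real system
# would run a character-level model over the partial query instead.

INTENT_EXAMPLES = {
    "people": ["john smith", "jane doe", "software engineer jobs"],
    "company": ["johnson & johnson", "java consulting llc"],
}

def predict_intent(prefix):
    """Return the intent whose example queries best match the prefix."""
    scores = {
        intent: sum(q.startswith(prefix) for q in examples)
        for intent, examples in INTENT_EXAMPLES.items()
    }
    return max(scores, key=scores.get)
```

The key property this illustrates is that the prediction can be updated after every keystroke, which is exactly what character-level (rather than word-level) input makes cheap.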
- Europe > Ireland > Connaught > County Galway > Galway (0.05)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)